[AMDGPU][SILoadStoreOptimizer] Fix lds address operand offset by RyanRio · Pull Request #176816 · llvm/llvm-project

RyanRio · 2026-01-19T20:40:13Z

The offset operand in GLOBAL_LOAD_ASYNC_TO_LDS_B128, for instance, is added to both the lds and global address, but SILoadStoreOptimizer is currently unaware of that. This PR inserts an add to counteract the offset meant for the global address. This one add is better than not doing the optimization at all, and having to insert 2 adds for each global address calculation (with no offset).

; ENABLE-LABEL: name: promote_async_load_offset
; ENABLE: liveins: $ttmp7, $vgpr0, $sgpr0_sgpr1
; ENABLE-NEXT: {{  $}}
; ENABLE-NEXT: renamable $vgpr1 = V_LSHLREV_B32_e32 8, $vgpr0, implicit $exec
; ENABLE-NEXT: renamable $vgpr2, renamable $vcc_lo = V_ADD_CO_U32_e64 $vgpr0, 512, 0, implicit $exec
; ENABLE-NEXT: renamable $vgpr3, dead $sgpr_null = V_ADDC_U32_e64 0, killed $vgpr0, killed $vcc_lo, 0, implicit $exec
; ENABLE-NEXT: renamable $vgpr1 = disjoint V_OR_B32_e32 0, killed $vgpr1, implicit $exec
; ENABLE-NEXT: renamable $vgpr0 = V_ADD_U32_e32 256, $vgpr1, implicit $exec
; ENABLE-NEXT: GLOBAL_LOAD_ASYNC_TO_LDS_B128 killed $vgpr0, $vgpr2_vgpr3, -256, 0, implicit-def $asynccnt, implicit $exec, implicit $asynccnt :: (load store (s128), align 1, addrspace 3)
; ENABLE-NEXT: GLOBAL_LOAD_ASYNC_TO_LDS_B128 killed $vgpr1, killed $vgpr2_vgpr3, 0, 0, implicit-def $asynccnt, implicit $exec, implicit $asynccnt :: (load store (s128), align 1, addrspace 3)

; DISABLE-LABEL: name: promote_async_load_offset
; DISABLE: liveins: $ttmp7, $vgpr0, $sgpr0_sgpr1
; DISABLE-NEXT: {{  $}}
; DISABLE-NEXT: renamable $vgpr1 = V_LSHLREV_B32_e32 8, $vgpr0, implicit $exec
; DISABLE-NEXT: renamable $vgpr2, renamable $vcc_lo = V_ADD_CO_U32_e64 256, $vgpr0, 0, implicit $exec
; DISABLE-NEXT: renamable $vgpr3, $sgpr_null = V_ADDC_U32_e64 0, $vgpr0, killed $vcc_lo, 0, implicit $exec
; DISABLE-NEXT: renamable $vgpr1 = disjoint V_OR_B32_e32 0, killed $vgpr1, implicit $exec
; DISABLE-NEXT: GLOBAL_LOAD_ASYNC_TO_LDS_B128 $vgpr1, killed $vgpr2_vgpr3, 0, 0, implicit-def $asynccnt, implicit $exec, implicit $asynccnt :: (load store (s128), align 1, addrspace 3)
; DISABLE-NEXT: renamable $vgpr2, renamable $vcc_lo = V_ADD_CO_U32_e64 512, $vgpr0, 0, implicit $exec
; DISABLE-NEXT: renamable $vgpr3, $sgpr_null = V_ADDC_U32_e64 0, killed $vgpr0, killed $vcc_lo, 0, implicit $exec
; DISABLE-NEXT: GLOBAL_LOAD_ASYNC_TO_LDS_B128 killed $vgpr1, killed $vgpr2_vgpr3, 0, 0, implicit-def $asynccnt, implicit $exec, implicit $asynccnt :: (load store (s128), align 1, addrspace 3)

This PR also promotes the global address to an offset when the offset is calculated with V_ADD_U64 on applicable gfx versions, (and inversely adds the LDS offset), whereas previously the optimization opportunity was missed entirely.

llvmbot · 2026-01-19T20:40:44Z

@llvm/pr-subscribers-backend-amdgpu

Author: Ryan Mitchell (RyanRio)

Changes

The offset operand in GLOBAL_LOAD_ASYNC_TO_LDS_B128, for instance, is added to both the lds and global address, but SILoadStoreOptimizer is currently unaware of that. This PR inserts an add to counteract the offset meant for the global address. This one add is better than not doing the optimization at all, and having to insert 2 adds for each global address calculation (with no offset).

; ENABLE-LABEL: name: promote_async_load_offset
; ENABLE: liveins: $ttmp7, $vgpr0, $sgpr0_sgpr1
; ENABLE-NEXT: {{  $}}
; ENABLE-NEXT: renamable $vgpr1 = V_LSHLREV_B32_e32 8, $vgpr0, implicit $exec
; ENABLE-NEXT: renamable $vgpr2, renamable $vcc_lo = V_ADD_CO_U32_e64 $vgpr0, 512, 0, implicit $exec
; ENABLE-NEXT: renamable $vgpr3, dead $sgpr_null = V_ADDC_U32_e64 0, killed $vgpr0, killed $vcc_lo, 0, implicit $exec
; ENABLE-NEXT: renamable $vgpr1 = disjoint V_OR_B32_e32 0, killed $vgpr1, implicit $exec
; ENABLE-NEXT: renamable $vgpr0 = V_ADD_U32_e32 256, $vgpr1, implicit $exec
; ENABLE-NEXT: GLOBAL_LOAD_ASYNC_TO_LDS_B128 killed $vgpr0, $vgpr2_vgpr3, -256, 0, implicit-def $asynccnt, implicit $exec, implicit $asynccnt :: (load store (s128), align 1, addrspace 3)
; ENABLE-NEXT: GLOBAL_LOAD_ASYNC_TO_LDS_B128 killed $vgpr1, killed $vgpr2_vgpr3, 0, 0, implicit-def $asynccnt, implicit $exec, implicit $asynccnt :: (load store (s128), align 1, addrspace 3)

; DISABLE-LABEL: name: promote_async_load_offset
; DISABLE: liveins: $ttmp7, $vgpr0, $sgpr0_sgpr1
; DISABLE-NEXT: {{  $}}
; DISABLE-NEXT: renamable $vgpr1 = V_LSHLREV_B32_e32 8, $vgpr0, implicit $exec
; DISABLE-NEXT: renamable $vgpr2, renamable $vcc_lo = V_ADD_CO_U32_e64 256, $vgpr0, 0, implicit $exec
; DISABLE-NEXT: renamable $vgpr3, $sgpr_null = V_ADDC_U32_e64 0, $vgpr0, killed $vcc_lo, 0, implicit $exec
; DISABLE-NEXT: renamable $vgpr1 = disjoint V_OR_B32_e32 0, killed $vgpr1, implicit $exec
; DISABLE-NEXT: GLOBAL_LOAD_ASYNC_TO_LDS_B128 $vgpr1, killed $vgpr2_vgpr3, 0, 0, implicit-def $asynccnt, implicit $exec, implicit $asynccnt :: (load store (s128), align 1, addrspace 3)
; DISABLE-NEXT: renamable $vgpr2, renamable $vcc_lo = V_ADD_CO_U32_e64 512, $vgpr0, 0, implicit $exec
; DISABLE-NEXT: renamable $vgpr3, $sgpr_null = V_ADDC_U32_e64 0, killed $vgpr0, killed $vcc_lo, 0, implicit $exec
; DISABLE-NEXT: GLOBAL_LOAD_ASYNC_TO_LDS_B128 killed $vgpr1, killed $vgpr2_vgpr3, 0, 0, implicit-def $asynccnt, implicit $exec, implicit $asynccnt :: (load store (s128), align 1, addrspace 3)

Full diff: https://github.com/llvm/llvm-project/pull/176816.diff

3 Files Affected:

(modified) llvm/lib/Target/AMDGPU/SIInstrInfo.h (+8)
(modified) llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp (+28-2)
(added) llvm/test/CodeGen/AMDGPU/promote-constOffset-to-imm-gfx12.mir (+74)

diff --git a/llvm/lib/Target/AMDGPU/SIInstrInfo.h b/llvm/lib/Target/AMDGPU/SIInstrInfo.h
index bd4029ca97014..2d13ee3800731 100644
--- a/llvm/lib/Target/AMDGPU/SIInstrInfo.h
+++ b/llvm/lib/Target/AMDGPU/SIInstrInfo.h
@@ -1020,6 +1020,14 @@ class SIInstrInfo final : public AMDGPUGenInstrInfo {
     return MI.getDesc().TSFlags & SIInstrFlags::LGKM_CNT;
   }
 
+  static bool usesASYNC_CNT(const MachineInstr &MI) {
+    return MI.getDesc().TSFlags & SIInstrFlags::ASYNC_CNT;
+  }
+
+  bool usesASYNC_CNT(uint16_t Opcode) const {
+    return get(Opcode).TSFlags & SIInstrFlags::ASYNC_CNT;
+  }
+
   // Most sopk treat the immediate as a signed 16-bit, however some
   // use it as unsigned.
   static bool sopkIsZext(unsigned Opcode) {
diff --git a/llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp b/llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp
index dc1592e5fb82a..f50333b5886a7 100644
--- a/llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp
+++ b/llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp
@@ -279,6 +279,7 @@ class SILoadStoreOptimizer {
 
   void updateBaseAndOffset(MachineInstr &I, Register NewBase,
                            int32_t NewOffset) const;
+  void updateAsyncLDSAddress(MachineInstr &MI, int32_t OffsetDiff) const;
   Register computeBase(MachineInstr &MI, const MemAddress &Addr) const;
   MachineOperand createRegOrImm(int32_t Val, MachineInstr &MI) const;
   std::optional<int32_t> extractConstOffset(const MachineOperand &Op) const;
@@ -2311,6 +2312,27 @@ void SILoadStoreOptimizer::processBaseWithConstOffset(const MachineOperand &Base
   Addr.Offset = (*Offset0P & 0x00000000ffffffff) | (Offset1 << 32);
 }
 
+// Maintain the correct LDS address for async loads.
+// It becomes incorrect when promoteConstantOffsetToImm
+// adds an offset only meant for the src operand.
+void SILoadStoreOptimizer::updateAsyncLDSAddress(MachineInstr &MI,
+                                                 int32_t OffsetDiff) const {
+  if (!TII->usesASYNC_CNT(MI) || OffsetDiff == 0)
+    return;
+
+  Register OldVDst = TII->getNamedOperand(MI, AMDGPU::OpName::vdst)->getReg();
+  Register NewVDst =
+      MRI->createVirtualRegister(TRI->getRegClassForReg(*MRI, OldVDst));
+  MachineBasicBlock &MBB = *MI.getParent();
+  DebugLoc DL = MI.getDebugLoc();
+  BuildMI(MBB, MI, DL, TII->get(AMDGPU::V_ADD_U32_e64), NewVDst)
+      .addReg(OldVDst)
+      .addImm(-OffsetDiff)
+      .addImm(0);
+
+  MI.getOperand(0).setReg(NewVDst);
+}
+
 bool SILoadStoreOptimizer::promoteConstantOffsetToImm(
     MachineInstr &MI,
     MemInfoMap &Visited,
@@ -2440,7 +2462,9 @@ bool SILoadStoreOptimizer::promoteConstantOffsetToImm(
     // Instead of moving up, just re-compute anchor-instruction's base address.
     Register Base = computeBase(MI, AnchorAddr);
 
-    updateBaseAndOffset(MI, Base, MAddr.Offset - AnchorAddr.Offset);
+    int32_t OffsetDiff = MAddr.Offset - AnchorAddr.Offset;
+    updateBaseAndOffset(MI, Base, OffsetDiff);
+    updateAsyncLDSAddress(MI, OffsetDiff);
     LLVM_DEBUG(dbgs() << "  After promotion: "; MI.dump(););
 
     for (auto [OtherMI, OtherOffset] : InstsWCommonBase) {
@@ -2451,7 +2475,9 @@ bool SILoadStoreOptimizer::promoteConstantOffsetToImm(
       if (TLI->isLegalFlatAddressingMode(AM, AS)) {
         LLVM_DEBUG(dbgs() << "  Promote Offset(" << OtherOffset; dbgs() << ")";
                    OtherMI->dump());
-        updateBaseAndOffset(*OtherMI, Base, OtherOffset - AnchorAddr.Offset);
+        int32_t OtherOffsetDiff = OtherOffset - AnchorAddr.Offset;
+        updateBaseAndOffset(*OtherMI, Base, OtherOffsetDiff);
+        updateAsyncLDSAddress(*OtherMI, OtherOffsetDiff);
         LLVM_DEBUG(dbgs() << "     After promotion: "; OtherMI->dump());
       }
     }
diff --git a/llvm/test/CodeGen/AMDGPU/promote-constOffset-to-imm-gfx12.mir b/llvm/test/CodeGen/AMDGPU/promote-constOffset-to-imm-gfx12.mir
new file mode 100644
index 0000000000000..3f243f39fa585
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/promote-constOffset-to-imm-gfx12.mir
@@ -0,0 +1,74 @@
+# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
+# RUN: llc -mtriple=amdgcn -mcpu=gfx1250 -verify-machineinstrs -start-before=si-load-store-opt -stop-before=si-insert-waitcnts --verify-machineinstrs -o - %s | FileCheck -check-prefixes=GFX1250 %s
+
+---
+name: promote_async_load_offset
+machineFunctionInfo:
+  stackPtrOffsetReg: '$sgpr32'
+  frameOffsetReg: '$sgpr33'
+tracksRegLiveness: true
+body:             |
+  bb.0.entry:
+    liveins: $vgpr0, $sgpr0_sgpr1, $ttmp7
+    ; GFX1250-LABEL: name: promote_async_load_offset
+    ; GFX1250: liveins: $ttmp7, $vgpr0, $sgpr0_sgpr1
+    ; GFX1250-NEXT: {{  $}}
+    ; GFX1250-NEXT: renamable $vgpr1 = V_LSHLREV_B32_e32 8, $vgpr0, implicit $exec
+    ; GFX1250-NEXT: renamable $vgpr2, renamable $vcc_lo = V_ADD_CO_U32_e64 $vgpr0, 512, 0, implicit $exec
+    ; GFX1250-NEXT: renamable $vgpr3, dead $sgpr_null = V_ADDC_U32_e64 0, killed $vgpr0, killed $vcc_lo, 0, implicit $exec
+    ; GFX1250-NEXT: renamable $vgpr1 = disjoint V_OR_B32_e32 0, killed $vgpr1, implicit $exec
+    ; GFX1250-NEXT: renamable $vgpr0 = V_ADD_U32_e32 256, $vgpr1, implicit $exec
+    ; GFX1250-NEXT: GLOBAL_LOAD_ASYNC_TO_LDS_B128 killed $vgpr0, $vgpr2_vgpr3, -256, 0, implicit-def $asynccnt, implicit $exec, implicit $asynccnt :: (load store (s128), align 1, addrspace 3)
+    ; GFX1250-NEXT: GLOBAL_LOAD_ASYNC_TO_LDS_B128 killed $vgpr1, killed $vgpr2_vgpr3, 0, 0, implicit-def $asynccnt, implicit $exec, implicit $asynccnt :: (load store (s128), align 1, addrspace 3)
+    %0:vgpr_32(s32) = COPY $vgpr0
+    %5:vgpr_32 = V_LSHLREV_B32_e64 8, %0:vgpr_32(s32), implicit $exec
+    %7:vgpr_32 = disjoint V_OR_B32_e64 %5:vgpr_32, 0, implicit $exec
+    %8:vgpr_32 = disjoint V_OR_B32_e64 %5:vgpr_32, 0, implicit $exec
+    %32:vgpr_32, %34:sreg_32_xm0_xexec = V_ADD_CO_U32_e64 %0:vgpr_32, 256, 0, implicit $exec
+    %33:vgpr_32, %35:sreg_32_xm0_xexec = V_ADDC_U32_e64 %0:vgpr_32, 0, killed %34:sreg_32_xm0_xexec, 0, implicit $exec
+    %17:vreg_64_align2 = REG_SEQUENCE %32:vgpr_32, %subreg.sub0, %33:vgpr_32, %subreg.sub1
+    GLOBAL_LOAD_ASYNC_TO_LDS_B128 killed %7:vgpr_32, killed %17:vreg_64_align2, 0, 0, implicit-def $asynccnt, implicit $exec, implicit $asynccnt :: (load store (s128), align 1, addrspace 3)
+    %40:vgpr_32, %36:sreg_32_xm0_xexec = V_ADD_CO_U32_e64 %0:vgpr_32, 512, 0, implicit $exec
+    %41:vgpr_32, %37:sreg_32_xm0_xexec = V_ADDC_U32_e64 %0:vgpr_32, 0, killed %36:sreg_32_xm0_xexec, 0, implicit $exec
+    %21:vreg_64_align2 = REG_SEQUENCE %40:vgpr_32, %subreg.sub0, %41:vgpr_32, %subreg.sub1
+    GLOBAL_LOAD_ASYNC_TO_LDS_B128 killed %8:vgpr_32, killed %21:vreg_64_align2, 0, 0, implicit-def $asynccnt, implicit $exec, implicit $asynccnt :: (load store (s128), align 1, addrspace 3)
+...
+
+---
+name: promote_async_load_offset_two_offsets
+machineFunctionInfo:
+  stackPtrOffsetReg: '$sgpr32'
+  frameOffsetReg: '$sgpr33'
+tracksRegLiveness: true
+body:             |
+  bb.0.entry:
+    liveins: $vgpr0, $sgpr0_sgpr1, $ttmp7
+    ; GFX1250-LABEL: name: promote_async_load_offset_two_offsets
+    ; GFX1250: liveins: $ttmp7, $vgpr0, $sgpr0_sgpr1
+    ; GFX1250-NEXT: {{  $}}
+    ; GFX1250-NEXT: renamable $vgpr1 = V_LSHLREV_B32_e32 8, $vgpr0, implicit $exec
+    ; GFX1250-NEXT: renamable $vgpr2, renamable $vcc_lo = V_ADD_CO_U32_e64 $vgpr0, 516, 0, implicit $exec
+    ; GFX1250-NEXT: renamable $vgpr3, dead $sgpr_null = V_ADDC_U32_e64 0, killed $vgpr0, killed $vcc_lo, 0, implicit $exec
+    ; GFX1250-NEXT: renamable $vgpr1 = disjoint V_OR_B32_e32 0, killed $vgpr1, implicit $exec
+    ; GFX1250-NEXT: renamable $vgpr0 = V_ADD_U32_e32 260, $vgpr1, implicit $exec
+    ; GFX1250-NEXT: GLOBAL_LOAD_ASYNC_TO_LDS_B128 killed $vgpr0, $vgpr2_vgpr3, -260, 0, implicit-def $asynccnt, implicit $exec, implicit $asynccnt :: (load store (s128), align 1, addrspace 3)
+    ; GFX1250-NEXT: renamable $vgpr0 = V_ADD_U32_e32 4, $vgpr1, implicit $exec
+    ; GFX1250-NEXT: GLOBAL_LOAD_ASYNC_TO_LDS_B128 killed $vgpr0, $vgpr2_vgpr3, -4, 0, implicit-def $asynccnt, implicit $exec, implicit $asynccnt :: (load store (s128), align 1, addrspace 3)
+    ; GFX1250-NEXT: GLOBAL_LOAD_ASYNC_TO_LDS_B128 killed $vgpr1, killed $vgpr2_vgpr3, 0, 0, implicit-def $asynccnt, implicit $exec, implicit $asynccnt :: (load store (s128), align 1, addrspace 3)
+    %0:vgpr_32(s32) = COPY $vgpr0
+    %5:vgpr_32 = V_LSHLREV_B32_e64 8, %0:vgpr_32(s32), implicit $exec
+    %7:vgpr_32 = disjoint V_OR_B32_e64 %5:vgpr_32, 0, implicit $exec
+    %8:vgpr_32 = disjoint V_OR_B32_e64 %5:vgpr_32, 0, implicit $exec
+    %32:vgpr_32, %34:sreg_32_xm0_xexec = V_ADD_CO_U32_e64 %0:vgpr_32, 256, 0, implicit $exec
+    %33:vgpr_32, %35:sreg_32_xm0_xexec = V_ADDC_U32_e64 %0:vgpr_32, 0, killed %34:sreg_32_xm0_xexec, 0, implicit $exec
+    %17:vreg_64_align2 = REG_SEQUENCE %32:vgpr_32, %subreg.sub0, %33:vgpr_32, %subreg.sub1
+    GLOBAL_LOAD_ASYNC_TO_LDS_B128 killed %7:vgpr_32, killed %17:vreg_64_align2, 0, 0, implicit-def $asynccnt, implicit $exec, implicit $asynccnt :: (load store (s128), align 1, addrspace 3)
+    %40:vgpr_32, %36:sreg_32_xm0_xexec = V_ADD_CO_U32_e64 %0:vgpr_32, 512, 0, implicit $exec
+    %41:vgpr_32, %37:sreg_32_xm0_xexec = V_ADDC_U32_e64 %0:vgpr_32, 0, killed %36:sreg_32_xm0_xexec, 0, implicit $exec
+    %21:vreg_64_align2 = REG_SEQUENCE %40:vgpr_32, %subreg.sub0, %41:vgpr_32, %subreg.sub1
+    GLOBAL_LOAD_ASYNC_TO_LDS_B128 %8:vgpr_32, killed %21:vreg_64_align2, 0, 0, implicit-def $asynccnt, implicit $exec, implicit $asynccnt :: (load store (s128), align 1, addrspace 3)
+    %42:vgpr_32, %38:sreg_32_xm0_xexec = V_ADD_CO_U32_e64 %0:vgpr_32, 516, 0, implicit $exec
+    %43:vgpr_32, %39:sreg_32_xm0_xexec = V_ADDC_U32_e64 %0:vgpr_32, 0, killed %38:sreg_32_xm0_xexec, 0, implicit $exec
+    %22:vreg_64_align2 = REG_SEQUENCE %42:vgpr_32, %subreg.sub0, %43:vgpr_32, %subreg.sub1
+    GLOBAL_LOAD_ASYNC_TO_LDS_B128 killed %8:vgpr_32, killed %22:vreg_64_align2, 0, 0, implicit-def $asynccnt, implicit $exec, implicit $asynccnt :: (load store (s128), align 1, addrspace 3)
+...

RyanRio · 2026-01-19T20:40:50Z

@nhaehnle @Sisyph

arsenm

Can you also include an end to end ir test

arsenm · 2026-01-21T10:55:26Z

+    return std::nullopt;
+
+  // Handle S_MOV_B64_IMM_PSEUDO for 64-bit immediates
+  if (Def->getOpcode() == AMDGPU::S_MOV_B64_IMM_PSEUDO)


Use getConstValDefinedInReg? Or extract SIFoldOperands::getImmOrMaterializedImm into SIInstrInfo

I chose to extract getImmOrMaterializedImm, maybe there's also room for improvement to clean up that getConstValDefinedInReg, but that's probably outside the scope of this PR

github-actions · 2026-01-21T11:27:12Z

🐧 Linux x64 Test Results

189206 tests passed
5015 tests skipped

✅ The build succeeded and all tests passed.

RyanRio · 2026-01-26T06:25:37Z

@arsenm Please merge on clean CI

mikaelholmen · 2026-01-26T13:08:34Z


-  return Def->getOperand(1).getImm();
+  // Now extract the base register (which should be a 64-bit VGPR).
+  MachineInstr *BaseDef = MRI->getVRegDef(BaseOp->getReg());


BaseDef is unused when building without asserts.

…toreOptimizer.cpp Fixes llvm#176816 (13b20e7).

Fix unused variable warning that fires in SILoadStoreOptimizer.cpp when assertions are disabled. On review, we can just delete the whole assert, since it isn't querying the def anymore. Fixes #176816 (13b20e7).

… async stores (#180220) `updateAsyncLDSAddress`, introduces by #176816, previously only handled async loads , where the LDS address is in the `vdst` operand. Therefore Async stores produced a nullptr dereference since the LDS address is in `vdata` for those instructions. --------- Co-authored-by: Jay Foad <jay.foad@gmail.com>

…etToImm for async stores (#180220) `updateAsyncLDSAddress`, introduces by llvm/llvm-project#176816, previously only handled async loads , where the LDS address is in the `vdst` operand. Therefore Async stores produced a nullptr dereference since the LDS address is in `vdata` for those instructions. --------- Co-authored-by: Jay Foad <jay.foad@gmail.com>

llvmbot added the backend:AMDGPU label Jan 19, 2026

arsenm reviewed Jan 20, 2026

View reviewed changes

RyanRio added 2 commits January 20, 2026 15:49

[AMDGPU][SILoadStoreOptimizer] Fix lds address operand offset

8867e4a

Add V_ADD_U64 pattern

e418da0

RyanRio force-pushed the users/rmitchel/fix_load_store_opt_async_lds branch from d8ebc76 to e418da0 Compare January 21, 2026 10:37

arsenm reviewed Jan 21, 2026

View reviewed changes

Refactor to use getImmOrMaterializedImm

8f14905

RyanRio requested a review from arsenm January 22, 2026 07:31

arsenm reviewed Jan 22, 2026

View reviewed changes

Comment thread llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp Outdated

Comment thread llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp Outdated

Comment thread llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp Outdated

Comment thread llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp Outdated

Address review

72c9cb9

RyanRio requested a review from arsenm January 23, 2026 05:54

arsenm approved these changes Jan 24, 2026

View reviewed changes

Merge branch 'main' into users/rmitchel/fix_load_store_opt_async_lds

5d372ef

arsenm merged commit 13b20e7 into llvm:main Jan 26, 2026
11 checks passed

mikaelholmen reviewed Jan 26, 2026

View reviewed changes

googlewalt added a commit to googlewalt/llvm-project that referenced this pull request Jan 26, 2026

[AMDGPU][SILoadStoreOptimizer] Fix unused variable warning in SILoadS…

aa4a6ca

…toreOptimizer.cpp Fixes llvm#176816 (13b20e7).

googlewalt mentioned this pull request Jan 26, 2026

[AMDGPU][SILoadStoreOptimizer] Fix unused variable warning #177969

Merged

RyanRio deleted the users/rmitchel/fix_load_store_opt_async_lds branch January 27, 2026 07:56

AlexAUT mentioned this pull request Feb 6, 2026

[AMDGPU] Fix LDS address correction in promoteConstantOffsetToImm for async stores #180220

Merged

Conversation

RyanRio commented Jan 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

llvmbot commented Jan 19, 2026

Uh oh!

RyanRio commented Jan 19, 2026

Uh oh!

arsenm left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

arsenm Jan 21, 2026

Choose a reason for hiding this comment

Uh oh!

RyanRio Jan 21, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions Bot commented Jan 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🐧 Linux x64 Test Results

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

RyanRio commented Jan 26, 2026

Uh oh!

Uh oh!

mikaelholmen Jan 26, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

RyanRio commented Jan 19, 2026 •

edited

Loading

github-actions Bot commented Jan 21, 2026 •

edited

Loading